Save all volatile argument registers in GC probe hijack frames on AMD64 and ARM64#126848
Save all volatile argument registers in GC probe hijack frames on AMD64 and ARM64#126848
Conversation
… frames On AMD64, when the GC hijacks a thread via RhpGcProbeHijack, the hijack stub creates a PInvokeTransitionFrame. Previously on Windows, only RAX and RCX were saved, and on Unix only RAX, RCX, and RDX were saved. If the GC info for the managed frame at the hijack point reports other volatile registers (RDX, R8, R9) as live GC references, the StackFrameIterator would find NULL save locations and crash (AV in SVR::GCHeap::Promote). Windows changes: - FixupHijackedCallstack now uses R10/R11 for thread pointer instead of RDX/R8, preserving all volatile argument registers (RAX, RCX, RDX, R8, R9) - PUSH_PROBE_FRAME/POP_PROBE_FRAME extended to save/restore RDX, R8, R9 - RhpGcProbeHijack flags include PTFF_SAVE_RDX + PTFF_SAVE_R8 + PTFF_SAVE_R9 - AsmMacros.inc gains PTFF_SAVE_RDX, PTFF_SAVE_R8, PTFF_SAVE_R9 definitions Unix changes: - FixupHijackedCallstack now also saves/restores R8 and R9 across GETTHREAD - Uses R10 instead of R8 for scratch in hijack fixup and bitmask passing - PUSH_PROBE_FRAME/POP_PROBE_FRAME extended to save/restore R8, R9 - RhpGcProbeHijack flags include PTFF_SAVE_R8 + PTFF_SAVE_R9 - unixasmmacrosamd64.inc gains PTFF_SAVE_R8, PTFF_SAVE_R9 definitions Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/411be3e7-43a1-4bc3-b896-e27d92fe37c3 Co-authored-by: mangod9 <61718172+mangod9@users.noreply.github.com>
Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/411be3e7-43a1-4bc3-b896-e27d92fe37c3 Co-authored-by: mangod9 <61718172+mangod9@users.noreply.github.com>
On Unix System V ABI, RSI and RDI are volatile argument-passing registers (1st and 2nd integer args) that could contain live GC references at hijack points. On Windows, they are already saved as callee-saved registers in PTFF_SAVE_ALL_PRESERVED. Unix changes: - FixupHijackedCallstack now saves/restores RSI and RDI across GETTHREAD - PUSH_PROBE_FRAME saves RSI and RDI between R12 and RBX, matching the StackFrameIterator's flag processing order (RBX, RSI, RDI, R12...) - POP_PROBE_FRAME restores RSI and RDI in correct order - RhpGcProbeHijack flags include PTFF_SAVE_RSI + PTFF_SAVE_RDI - unixasmmacrosamd64.inc gains PTFF_SAVE_RSI and PTFF_SAVE_RDI definitions Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/85ca9ec1-057b-4a3c-8182-e6da274d44b6 Co-authored-by: mangod9 <61718172+mangod9@users.noreply.github.com>
IIRC, we have only discussed preserving argument registers in #122639. None of the asm helpers are not setup to save all argument registers. Is it a problem to restrict the JIT to only assume argument registers are preserved around the resolve helper? |
|
Also, I assume that the JIT expects floating point argument registers to be preserved. Is that right? The helper changed in this PR is only preserving xmm0. It should be fixed to preserve all arg registers. |
No, we can just update @copilot Please update this PR to change
Yes, that is right. |
@copilot Please update this PR to preserve floating point argument registers in these asm helpers. |
…gisters in probe frames Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/2bbc56c9-47e8-42f8-903e-51a79bbb96c6 Co-authored-by: jakobbotsch <7887810+jakobbotsch@users.noreply.github.com>
Done in b1486c3.
FP argument registers: Updated all GcProbe files to save/restore all FP arg registers:
Fixed in b1486c3. All FP arg registers are now preserved: xmm0-3 (Windows AMD64), xmm0-7 (Unix AMD64), q0-q7 (ARM64). On Unix, they're also saved/restored across |
Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/2bbc56c9-47e8-42f8-903e-51a79bbb96c6 Co-authored-by: jakobbotsch <7887810+jakobbotsch@users.noreply.github.com>
…egisters including float Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/24a8bc8a-0f2f-49c9-b96a-268d92070834 Co-authored-by: jakobbotsch <7887810+jakobbotsch@users.noreply.github.com>
... Already done in b1486c3. All FP arg registers are preserved: xmm0-3 (Windows AMD64), xmm0-7 (Unix AMD64), q0-q7 (ARM64). On Unix, they're also saved/restored across |
|
/azp run runtime-nativeaot-outerloop |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Description
CrossGen2 crashes with AV in
SVR::GCHeap::Promotewhen the GC hijacks a thread at a point where volatile argument registers contain live GC references. The hijack stub'sPInvokeTransitionFramedidn't save these registers, soStackFrameIteratorfound NULL save locations when GC info reported them as live.Windows AMD64:
FixupHijackedCallstackused RDX for the thread pointer and R8 as scratch, destroying their original values. Only RAX/RCX were saved. Only xmm0 was preserved for floating point.Unix AMD64: RSI, RDI, R8, and R9 were destroyed by the
INLINE_GETTHREADfunction call and never saved. On the System V ABI, RSI and RDI are volatile argument-passing registers (1st and 2nd integer args) that could contain live GC references at hijack points. On Windows, RSI/RDI are already covered as callee-saved registers inPTFF_SAVE_ALL_PRESERVED. Only xmm0/xmm1 were preserved for floating point.ARM64 (Windows and Unix):
FixupHijackedCallstackused x3 as trash and x4 for the thread pointer, destroying their original values. Only x0-x2 were saved in the probe frame. Volatile argument registers x3-x7 could contain live GC references at hijack points but were not preserved. Only q0-q3 were preserved for floating point.JIT interface resolution:
RBM_INTERFACELOOKUP_FOR_SLOT_TRASHdid not correctly indicate that all non-argument volatile registers (including floating point and mask registers) are trashed, which could cause the JIT to assume non-argument volatile registers survive the resolution helper.AMD64 Changes
FixupHijackedCallstack(Windows): Use R10/R11 instead of RDX/R8 for thread pointer and scratch, preserving all volatile argument registersFixupHijackedCallstack(Unix): Save/restore RSI, RDI, R8, R9, and xmm0-xmm7 aroundINLINE_GETTHREADusing CFI-awarepush_register/pop_register/alloc_stack/free_stackmacros; use R10 instead of R8 for scratchPUSH_PROBE_FRAME(Windows): Save/restore RDX, R8, R9 in correctm_PreservedRegsorder matchingStackFrameIteratorflag processing; save xmm0-xmm3 (was xmm0 only)PUSH_PROBE_FRAME(Unix): Save/restore RSI, RDI, RDX, R8, R9 in correctm_PreservedRegsorder matchingStackFrameIteratorflag processing (RSI/RDI placed between R12 and RBX) usingpush_registermacros; save xmm0-xmm7 (was xmm0-xmm1 only)POP_PROBE_FRAME(Windows): Restore RDX, R8, R9 and xmm0-xmm3POP_PROBE_FRAME(Unix): Restore RSI, RDI, RDX, R8, R9 and xmm0-xmm7 usingpop_register/pop_nonvol_reg/free_stackmacros for proper CFI unwind directivesPTFF_SAVE_RDX + PTFF_SAVE_R8 + PTFF_SAVE_R9PTFF_SAVE_RSI + PTFF_SAVE_RDI + PTFF_SAVE_RDX + PTFF_SAVE_R8 + PTFF_SAVE_R9PTFF_SAVE_RDX,PTFF_SAVE_R8,PTFF_SAVE_R9toAsmMacros.inc; addPTFF_SAVE_RSI,PTFF_SAVE_RDI,PTFF_SAVE_R8,PTFF_SAVE_R9tounixasmmacrosamd64.incARM64 Changes
FixupHijackedCallstack(Windows): Use x9/x10 instead of x4/x3 for thread pointer and scratch, preserving all volatile argument registers x0-x7FixupHijackedCallstack(Unix): UseINLINE_GETTHREAD x9(non-ETLS) or save/restore x0-x7 and q0-q7 aroundRhpGetThreadcall (ETLS path), preserving all argument registersPUSH_PROBE_FRAME: Save x0-x7 (was x0-x2) and q0-q7 (was q0-q3) in correctm_PreservedRegsorder matchingStackFrameIteratorflag processingPOP_PROBE_FRAME: Restore x0-x7 (was x0-x2) and q0-q7 (was q0-q3)PTFF_SAVE_X3 + PTFF_SAVE_X4 + PTFF_SAVE_X5 + PTFF_SAVE_X6 + PTFF_SAVE_X7PTFF_SAVE_X3throughPTFF_SAVE_X7tounixasmmacrosarm64.inc(WindowsAsmMacros.halready had them)JIT Changes
RBM_INTERFACELOOKUP_FOR_SLOT_TRASH(AMD64): Updated to(RBM_CALLEE_TRASH & ~(RBM_ARG_REGS | RBM_FLTARG_REGS))— trashes all non-argument volatile registers (integer, floating point, and mask), ensuring the JIT does not assume any non-argument volatile registers survive the interface resolution helperRBM_INTERFACELOOKUP_FOR_SLOT_TRASH(ARM64): Updated to(RBM_CALLEE_TRASH & ~(RBM_ARG_REGS | RBM_FLTARG_REGS))— trashes all non-argument volatile registers (integer x8-x15/IP0/IP1/LR, non-argument floating point, and mask registers)